United  States.  Naval  Academy  - Trident  Scholar  project  report,  no.  75  (1976) 


(J&l  FREQUENCY  ^NALYSIS  RISING  A MINICOMPUTER.  y 

Scholar  Proj  ec^Report  ^ 


Midshipman  D.  C./Boch,  Class  1976 
U.  S.  Naval  Academy  ^ 


Annapolis,  Maryland 


Accepted  for  Trident  Scholar  Committee 


C.  W.  RECTOR,  Chairman 


F.  y.  EBERHARDT,  Professor 
Elec/ttfical,  Engineer ing  Department 


Hr  M.  NEUSTADT,  Associate  Professor 
Electrical  Engineering  Department 


D.  A.  WRIGHT,  MAJ,  CAE/ 

Electrical  Engineering  Department 


I 


\| 


ABSTRACT 


The  spectral  analysis  of  waveforms,  whether  these  waves 
arc  acoustic  or  electrical  in  nature,  has  evolved  into  an 
important  aspect  of  quite  a wide  variety  of  scientific 
endeavors.  Utilized  primarily  by  the  Navy  in  the  study  of 
underwater  sound,  frequency  analysis  also  finds  utility  in 
research  on  mechanical  vibrations,  speech,  music,  et  cetera. 
Real-time  capability  is  necessary  for  many  of  these  appli- 
cations. That  is,  the  transformation  must  be  completed  with- 
in the  time  interval  over  which  its  sampled  data  is  acquired 
so  that  spectral  plots  may  be  generated  on  a continuing  basis. 

The  basis  for  a computer-aided  frequency  analysis 
scheme  is  known  as  a Discrete  Fourier  Transform,  or  DFT. 

The  inherent  flexibility  of  a general  purpose  computer  lends 
itself  quite  well  to  the  implementation  of  a high  speed 
adaptation  of  the  DFT,  called  a Fast  Fourier  Transform  or 

Tlvis  study  was  initiated  with  the  aim  of  determining 
the  fundamental  capabilities  and  limitations  of  one  mini- 
computer in  the  development  of  a spectral  plot.  Included 
within  this  general  topic  were  two  fundamental  goals: 

1)  Development  of  the  fastest  FFT  possible  utilizing 
only  the  basic  capabilities  of  the  PDP-8/E  general  purpose 
minicomputer.  Successful  completion  of  this  objective 
brought  the  machine's  fundamental  limitations  for  real-time 
spectral  analysis  into  clear  perspective.  


2)  Investigation  of  the  feasibility  of  an  external 
hardware  "add-on"  which  would  considerably  increase  the 
spectral  analysis  capabilities  of  the  computer.  Proving 
worthwhile,  the  design  of  such  hardware  was  launched  but 
not  completed,  due  to  time  restrictions. 

As  a result  of  this  study,  it  became  apparent  that  all 
three  systems  (DFT,.FFT,  FFT  hardware)  are  capable  of 
operation  in  real  time.  However,  the  upper  frequency  limit 
for  the  DFT  would  be  approximately  50  Hz,  for  the  FFT,  500 
Hz,  and  for  the  FFT  hardware,  2500  Hz. 
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CHAPTER  ONE: 

THE  DISCRETE  FOURIER  TRANSFORM 
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The  first  step  towards  developing  a frequency  analysis 
system  on  the  PDP-8/E  minicomputer  involved  the  Discrete 
Fourier  Transform,  or  DFT.  As  the  ultimate  goal  of  the 
research  was  to  develop  a real-time  analysis  capability,  it 
was  felt  that  the  DFT  would  be  the  logical  starting  point. 
This  is  so  primarily  for  two  reasons.  First  of  all,  the  DFT 
is  the  basis  for  the  Fast  Fourier  Transform  (FFT)  algorithm. 
The  FFT  was  to  be  the  main  course  of  study  for  this  analysis. 
Secondly,  the  DFT  is  relatively  simple,  allowing  the  oppor- 
tunity to  become  thoroughly  familiar  with  the  minicomputer, 
while  at  the  same  time,  to  gain  a solid  foothold  in  the 
understanding  of  the  frequency  analysis  problem. 

The  basic  approach  used  in  the  development  of  the  DFT 
software  was  primarily  very  cautious.  In  other  words,  in 
order  to  avoid  any  overflow  problems  within  the  programming, 
triple  precision  (36  bit  word)  arithmetic  was  used.  Like- 
wise, as  a full  field  (4096  points)  was  the  simplest  to 
implement,  the  first  DFT  was  built  around  this  idea. 

The  Fourier  Transform  itself  is  actually  quite  simple. 


where 


N- 1 i QnZ 

So  = E Xn£ 

1 n=0 

Y 

n = "amplitude"  of  a given  data  point. 

Since  the  computer  can  not  distinguish  a complex  number  from 
a real  number,  the  term 

2tt 
i N 

eJ  must  he  represented 

as  cos  + j sin  This  necessitates  a trigonometric 

table  to  be  computed  and  stored  in  the  PDP-8/E's  memory. 

As  N = 4096,  a 4096  point  SINE  curve  was  written  into 
memory,  and  the  COSINE  terms  were  determined  by  a simple  90° 
(1/4  field)  phase  shift  of  the  SINE  table. 

Due  to  the  uncertainty  of  the  magnitude  of  each  S^  term 
which  would  be  formed  and  of  the  degree  of  significance 
which  should  be  maintained  throughout  each  calculation, 
triple  precision  arithmetic  was  developed  and  utilized  at 
all  stages  of  summation.  Thus,  the  possibility  of  any  over- 
flow was  virtually  eliminated.  Overflow  is  the  result  of 
tne  addition  of  two  numbers  whose  sum  is  greater  than  the 
largest  word  the  machine  can  handle,  in  this  case,  2^  or 
4096.  Hence  the  sum  3874  + 2521  would  not  be  6395.  Rather, 
it  would  be  represented  as  6395  - 4096  = 22991Q,  which  is 
6395,  modulo  4096. 

When  concerned  with  both  positive  and  negative  numbers 
in  the  minicomputer,  the  seemingly  simple  triple  precision 
arithmetic  can  appear  somewhat  confusing  upon  application. 
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Double  precision  arithmetic  is  available  in  the  Extended 
Arithmetic  Element  installed  on  the  PDP-8/E.  Thus,  only 
the  inclusion  of  a third,  highest  order  word  poses  a prob- 
lem. Consider  a double  order  case.  The  same  principles  are 
involved  for  triple  order  precision.  If  a single  precision 
word,  B,  is  added  to  the  double  precision  word  A^jA^q,  four 
cases  exist  which  require  an  action  on  the  high  order  word 
A^i . These  cases,  as  well  as  the  necessary  actions,  appear 
in  Figure  1. 
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if  Link  set,  Add  1 to  Ahi 

~< 0 

>0 

if  Link  not  set,  Subtract 

<0 

<0 

no  action 

Figure  1: 

Tripl 

e Precision  Operations. 

After  approximately  one  and  a half  months,  the  DFT  was 
fully  operational.  In  order  to  determine  its  effectiveness, 
several  waveforms  were  sampled  by  the  minicomputer's 
Analog-to-Digital  (A/D)  converter.  The  4096  sample  points 
were  stored  in  core  and  the  transform  begun.  In  order  to 
assure  that  the  software  was  running  properly,  the 
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resultant  for  each  frequency  was  displayed  on  an  oscil- 
loscope as  that  Si  was  formed.  As  a result,  the  transform 
could  be  seen  "growing,"  which  was  rather  impressive  to 
watch,  if  nothing  else.  The  DFT  required  nearly  five 
minutes  to  run  to  completion.  This  run-time  is  based  on 
the  computation  of  only  512  frequencies  of  the  4096  possible. 

The  first  test  waveform  was  a simple  100  Hz,  10  volt 
peak-to-peak  sine  wave.  The  energy  distribution  appears  in 
Figure  2.  Note  that  the  horizontal  axis  runs  from  zero  to 
512  Hz.  The  first  "blip"  at  the  start  of  the  display  is  a 
sync  pulse  used  to  trigger  the  oscilloscope. 
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Figure  2 : DFT  of  a 100  Hz  Sine  Wave 

As  can  be  seen,  the  point  with  the  highest  magnitude 
occurs  at  the  frequency  of  the  sinusoidal  input,  as  expected. 
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The  next  logical  test  was  a 100  Hz,  10  volt  peak-to- 
peak  square  wave.  The  resultant  energy  distribution 
appears  in  Figure  3. 


^ ' . . A 


Figure  3:  DFT  of  a 100  Hz  Square  Wave 

Energy  distributions  for  other  waveforms  are  shown  in 
Appendix  A. 

The  ability  to  choose  a minimum  frequency  for  the  DFT 
to  operate  on  is  probably  the  greatest  advantage  of  the  DFT 
For  example,  if  one  were  only  interested  in  frequencies 
greater  than  twenty  Hertz,  the  minimum  frequency  would  be 
chosen  as  twenty  Hertz,  allowing  532  Hz  as  the  upper  fre- 
quency in  one  512  point  pass  of  the  program.  In  addition, 
if  the  frequency  band  of  interest  were,  for  example,  125  - 
185  Hz,  the  minimum  frequency  could  be  set  at  125  Hz  and 
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the  program  could  be  halted  after  the  first  sixty 
calculations.  In  other  words,  the  program  would  only  have 
to  run  for  60/512  X five  minutes  in  order  to  analyze  this 
particular  band. 

The  primary  disadvantage  of  the  DFT  is  its  lack  of 
speed,  especially  when  considering  the  entire  spectrum. 
Considering  a one  second  data  acquisition  time  and  perhaps 
a one  second  display  time  (at  the  very  greatest) , a trans- 
form which  takes  five  minutes  to  run  is  wholly  unsuitable 
for  any  real-time  application. 

The  DFT  proved  to  be  a very  effective  learning  tool. 
The  extensive  amount  of  time  spent  debugging  the  program 
later  proved  valuable  when  debugging  the  Fast  Fourier 
iransform  software.  Also,  the  actual  use  of  the  machine's 
A/D  and  D/A  converters,  real-time  clock  and  Extended  Arith- 
metic Element  was  a better  way  to  learn  of  their  use  than 
were  the  handbooks.  As  the  system  proved  absolutely  incap- 
able of  running  in  real-time,  further  pursuit  of  the  DFT, 
was  not  considered  worthwhile.  However,  it  became  obvious 
while  running  test  waveforms  through  the  system  that  a 
logarithmic  scaler  on  the  output  data  would  prove  very  use- 
ful. On  further  investigation,  the  idea  was  found  to  be 
infeasible  in  that  nearly  all  low  order  significance  would 
be  destroyed.  Hence,  it  too  was  abandoned,  and  work  was 
started  on  the  FFT . 
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CHAPTER  TWO: 

THE  FAST  FOURIER  TRANSFORM 

The  Fast  Fourier  Transform  (FFT)  was  begun  in  the  hope 
that  it  would  significantly  decrease  the  amount  of  time 
required  to  process  any  given  waveform.  The  equations 
involved  are  basically  the  same  as  those  used  for  calcu- 
lation of  the  DFT,  However,  no  transform  is  actually 
calculated,  in  the  sense  of  the  DFT.  The  FFT  used  (there 
are  several  different  types)  is  quite  simple  on  the  surface. 
Basically,  what  occurs  is  successive  reduction  of  one  DFT 
into  two  smaller  DFT's  until  the  point  is  reached  where  a 
two  point  DFT  is  divided  down  into  two  single  point  DFT's. 
Actually,  these  single  point  DFT's  are  the  relative  energy 
proportions  for  a particular  spectral  line.  An  example  of 
this  reduction  process  for  an  eight  point  FFT  is  shown  in 
Figure  4. 

While  the  DFT  is  characterized  as  requiring  N oper- 
ations to  perform,  the  FFT  has  the  advantage  of  only 
needing  to  perform  Nlog2N  operations,  where  N is  the  number 
of  points  in  the  transform.  The  DFT  calculations  were  laid 
out  in  the  previous  chapter.  The  FFT  calculations  are  best 

% a. 

illustrated  by  Figure  5 . Za  and  are  both  complex 
numbers;  e01^  can  be  expressed  as  cosmG  + j sinm0 . The  use  of 
sine  and  cosine  necessitates  the  compilation  of  a trigono- 
metric table  for  storage  in  the  minicomputer.  The  basic 


Figure  4:  Reduction  Process  Involved 

in  an  liight  Point  FFT.1 


Figure  5:  A Demonstration  of  the  Formation 

of  Two  Points  Required  for  the  Division  of  One  DFT 
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FFT  calculation  can  be  expanded  as  follows. 

% 

Assuming,  Zfl  = A+jB  Zb  = C+jD 

cos  m0  + j sin  me  = X+jY  then, 

Za  + Z5  = (A+C)  = A' 

♦ 

j(B+D)  = jB' 

(Za-Zb)emj0  = [ CA-C)  X - (B-D)Y]  = C' 

+ 

j [ (A-C) Y + (B-D)X]  = jD'  . 

Thus,  the  mathematics  portion  of  the  FFT  program  forms  the 
four  terms  A',  B",  C',  D'.  These  terms  are  then  combined  as 

A'  + j B ' = Za+1 

% 

C.'  + jD'  = Zb  + i to  form  the 

new  set  of  DFTs,  which  will  then  be  ready  for  division  into 
two  other  DFT's. 

Choosing  which  two  complex  numbers  are  to  be  combined 
in  the  manner  of  Figure  5 is  perhaps  the  most  difficult 
portion  of  the  FFT  algorithm  to  understand.  In  the  example 
of  Figures  4 and  6 , the  first  word  in  each  of  the  two 
four  point  DFT's  is  formed  through  combination  of  f0  and  f4. 
In  this  case,  f o = A+jB  where  B = 0 and  f4  = C + jD  where  D=0. 
Then,  aQ  = A+C  + j (o)  and  b0  = (A-C)  x + j(A-C)Y.  Likewise, 
aj  and  bj  are  formed  by  properly  combining  fj  and  fs.  The 
first  position  in  the  first  two  point  DFT  is  formed  through 
combination  of  aQ  and  &2,  and  so  on.  The  manner  of  indexing 


is  explained  further  in  the  next  chapter. 

Note  that  the  subscripts  on  the  frequency  components 
are  out  of  order.  Careful  observation  shows  that  the  sub 


scripts  are  in  a hit -reversed  order.  That  is,  I 


position  l (4=1002,  1 = 001 2).  F(,  is  in  position  3 (6=1102 

3=0 1 1?)  , and  so  on. 


Figure  6 : Method  of  Combining 

Expressions  for  the  FFT.2 


hit  reversal  is  easily  and  uniquely  accomplished  by  use  of 
the  buffered  I/O.  The  subscript  (actually  a core  address) 
is  put  out  on  the  I/O  lines,  reversed  by  means  of  a cable 
and  put  back  in  on  the  I/O  line.  The  contents  at  the 


11 


i 


original  core  address  are  then  placed  in  the  correct,  bit- 
reversed  core  address. 

Two  major  obstacles  were  encountered  in  the  develop- 
ment of  the  TFT.  The  first  concern  was  what  degree  of  sig- 
nificance should  be  maintained  in  the  calculations.  The  DFT 
indicated  that  double  precision  arithmetic  would  be  most 
advantageous.  By  using  double  precision,  round-off  errors 
(actually  truncation  errors)  would  be  eliminated  when  per- 
forming a multiplication.  Multiplying  two  twelve  bit  numbers 
using  the  Extended  Arithmetic  Element  produces  a 24  bit 
(double  order)  result.  If  only  single  precision  were 
required,  the  lower  order  twelve  bits  would  be  dropped. 

Through  this  process,  the  significance  of  the  lower  order 
12  bits  is  obviously  lost.  In  addition,  any  product  less 
than  1 OOOOg  (4096)  becomes  zero. 

On  the  other  hand,  double  precision  addition  and  store 
operations  require,  obviously,  twice  as  many  instruction 
cycles  within  the  computer  and,  therefore,  twice  as  much 
time.  Also,  a double  precision  store  will  not  clear  the 
accumulator,  as  will  a single  precision  store.  Further,  the 
amount  of  core  required  for  data  storage  would  be  twice  as 
great  for  double  precision  operations  as  for  single  precision. 

As  the  primary  concern  of  this  research  is  the  time 
required  to  run  a complete  transform,  single  precision 
arithmetic  was  chosen.  The  time  advantage  gained  through 
the  use  of  single  precision  greatly  offsets  the  significance 
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lost  through  multiplication.  Likewise,  the  extra  core 
space  available  allows  a transform  of  512  points  to  be 
located  in  one  data  field.  With  double  precision  arithmetic, 
the  largest  transform  which  can  be  run  in  one  data  field  is 
only  half  the  size  of  the  aforementioned  transform.  As  the 
range  ol  frequencies  to  be  analyzed  depends  primarily  upon 
the  number  of  sampled  data  points,  the  use  of  single  pre- 
cision operations  affords  the  capability  to  examine  a larger 
spectrum . 

The  second  major  problem  became  apparent  when  consid- 
ering how  to  avoid  overflow  problems  when  not  using  double, 
or  even  triple,  precision  arithmetic.  Obviously,  some  sort 
of  scaling  scheme  had  to  be  devised  where  overflow  could  not 
possibly  occur.  With  a simple  addition,  dividing  each 
operand  by  two  before  adding  will  result  in  no  overflow 
occurring.  As  an  example,  consider  (with  unsigned  numbers) 

6 0 0 0 8 + 6 0 0 0 g = 1 4 0 0 0 g . 

In  this  case,  overflow  does  indeed  occur  (the  1 in  the  sum 
is  the  thirteenth  bit).  Now,  if  each  operand  is  scaled  as 
suggested , 


6 0 0 0 g 
" 2 


600°g 


6 0 0 0 g • 


Since  all  operands  arc  scaled  before  addition  (or  subtraction) 
the  bOOOg  which  is  the  sum  has  a somewhat  different  meaning 
than  the  bOOO«  which  is  an  operand.  In  other  words,  before 
addition,  the  operands  have  become  3 0 0 0 g . Hence  it  is  easy 
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to  see  that 

3000g  + 3000g  = 6000g  . 

This  same  divide-by-two  scale  factor  also  applies  it- 
self in  the  formation  of  the  term 

[ (A- C) X - (B-D)Y]  + j [ (A-C) Y + (B-D)X]  , 

which  can  be  represented  as 

I za  • zbl  I wm I Where 
Za  = A+jB,  Zb  = C+jD,  and  wm  = Xm  + jYm  . 


When  the  terms  (A-C)  and  (B-D)  are  scaled  in  the  same  manner, 
the  relation  essentially  becomes 


^ % 


Based  on  the  fact  that  |wm|  is  unity  and  the  fact  that 

'X,  'Xj  1 

I za  “ zb | , y _ y 

1 j 1 1 (considering  |Za|  and  | Z^  | to  always  be  bounded 


above  by  unity) , it  is  quite  obvious  that 


<Xj  'V 

za  * zb(  ^ 


I wm  I < 1 • 


% 


Since  | Za  + Z^ | < |Za|  + | , it  follows  that 


'Xj  T, 

, Za  " Zb  . . -V  . , ‘Xj  . . 'Xj  , 

I 2 I I wm I i I z a i or  I zb I • 


Thus,  a scale  down  factor  of  two  cannot  cause  overflow  when 
going  from  one  vector  to  another.  By  the  nature  of  the 
data  acquisition  techniques,  the  terms  in  the  first  vector 


14 


have  a magnitude  less  than  or  equal  to  unity.  That  is,  the 
magnitude  of  the  real  components  are  limited  to  unity  and 
the  magnitude  of  the  imaginary  components,  naturally,  are 
zero.  Therefore  as  there  is  no  overflow  in  going  from  the 
zeroeth  vector  to  the  first,  there  can  be  no  overflow  any- 
where else,  as  just  demonstrated. 

The  data  acquisition  software  is  such  that  the  data  are 
not  scaled  from  0 to  1,  rather  a scale  of  0 to  3777g  is 
used.  The  A/1)  converter  was  set  up  so  that  a ten  volt  signal 
would  generate  the  number  3777g  and  a negative  ten  volt 
signal  would  generate  the  number  4000g.  In  the  minicomputer 
architecture,  the  numbers  0 through  3777g  are  considered 
positive  (3777g  being  the  most  positive),  and  the  numbers 
4000g  through  7777g  are  considered  to  be  negative  (4000g 

4 

being  the  most  negative). 

It  turns  out  that  such  a scaling  scheme  is  optimum. 

As  the  foregoing  discussion  points  out,  a scale  factor 
greater  than  two  would  be  unnecessary.  The  result  would  be 
a steady,  almost  drastic  decrease  in  the  magnitude  of  the 
contents  of  the  data  vectors.  A scale  factor  less  than  two 
would  surely  be  insufficient.  Consider  a worst-case  situ- 
ation. If  a ten  volt  D.C.  waveform  were  read  in  as  data, 
the  transform  should  produce  a spectral  line  at  zero  Hz , 
of  magnitude  3777g.  Since  the  zero  Hz  component  is  found 
by  repeated  additions  of  A and  C,  scaled  by  two,  the 
eventual  result  will  be  3777g.  That  is 
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3777„  3777q 

2 8 ♦ -^“8  = 3777g  • 

This  would  continue  for  all  points  in  the  data  vector  (all 
of  which  have  value  3777g).  Thus  the  end  result  would 
indeed  be  a spectral  line  at  zero  Hz  of  magnitude  3777g.  A 
scale  factor  of  less  than  two  would  therefore  result  in 
repeated  overflow  problems. 

The  evidence  presented  in  the  foregoing  discussions 
indicate  that  a divide-by- two  scale  factor  is  optimum.  How- 
ever, one  other  scaling  difficulty  results  from  the  trun- 
cation of  the  product  formed  through  the  MUY  (multiply) 
operation.  Perhaps  the  best  way  to  describe  this  is  by 
example.  Consider 

3777g  X 3777g  Approximating, 

4000g  X 4000g  = 2000  0000  g . 

Truncating  results  in  the  product  2000g.  Therefore,  scaling 
the  product  up  by  2 results  in  the  proper  answer  4000g,  or 
more  precisely,  3776 g.  The  multiplication  of  two  positive 
numbers  should  result  in  a positive  product.  As  3777g  is 
the  largest  possible  positive  number,  the  result  can  be  no 
"larger”  than  3777g. 

In  order  to  overcome  this  difficulty  without  adding  any 
additional  time  delays  in  the  programming,  the  trigonometric 
tables  were  scaled  on  the  basis  of  "+"  7777g  rather  than 


+ 3777g.  Thus,  by  permanently  scaling  the  multiplicands, 
there  was  no  need  to  repeatedly  scale  the  products.  There 
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is  an  apparent  deviation  from  the  sign  conventions  used  by 
the  minicomputer.  However,  as  the  multiply  operation  works 
with  unsigned  numbers,  another  method  of  determining  the 
sign  of  the  trigonometric  values  needs  to  be  utilized.  This 
method  is  described  in  the  next  chapter. 

The  trigonometric  tables  discussed  above  differ  in  yet 
another  way  from  those  used  with  the  DFT  software.  The 
number  of  spectral  lines  which  can  be  calculated  is  limited 
by  the  sampling  rate  and  the  transform  size.  For  example, 
with  a sampling  rate  of  4096  points  per  second  and  a 512 
point  transform,  the  Nyquist  frequency  is  2048  Hz.  Because 
of  the  way  the  l-'FT  algorithm  is  designed,  the  first  256 
spectral  lines  show  the  energy  proportions  up  to  2048  Hz  in 
discrete  steps  of  8 Hz.  The  second  set  of  256  spectral 
lines  is  the  exact  mirror  image  of  the  first  set.  The 
nature  of  the  FIT  is  such  that  the  first  referenced  point  in 
one  of  the  trigonometric  tables  after  the  180°  point  has  the 
same  value  as  the  first  point  in  the  table,  and  so  on. 
Therefore,  only  half-wave  tables  are  required  for  the  FFT. 

The  same  is  true  for  the  DFT,  except  that  with  the  DFT,  there 
is  a gain  in  transform  speed  when  separate  tables  are  used. 
Figure  7 shows  the  combined  effects  of  the  "forced  full- 
scaling" and  the  use  of  half-wave  tables. 

Due  to  the  very  nature  of  the  PDP-8/E,  there  exists 
what  is  known  as  a "paging"  problem.  The  method  of  memory 
addressing  within  the  computer  results  in  the  capability  to 
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only  address  directly  128  locations  in  core.  Those  loca- 
tions not  within  a particular  128  point  "page"  must  be 

3 

addressed  indirectly.  As  a result  of  this,  the  entire  FFT 
program  cannot  be  stored  so  that  a simple  loop  will  be  run. 
Rather,  the  mathematics  portion  of  the  program  must  be 
indirectly  accessed  as  a subroutine  by  the  index  controller 
program.  The  indexing  program  maintains  the  proper  indirect 
address  (to  be  explained  later)  of  the  data  and  trigonometric 
tables  for  use  by  the  mathematics  loop. 


Figure  7 : FFT  Trigonometric 

Tables  as  Stored  in  Core. 

The  combination  of  the  inherent  speed  of  the  FFT 
algorithm  and  those  time-saving  techniques  discussed  earlier 


in  this  chapter  result  in  a 512  point  transform  running  to 
completion  in  0.58  seconds.  Though  not  quite  a fair  com- 
parison (considering  the  relative  size  of  the  I)FT)  with  the 
Dl* T , there  is  obviously  a tremendous  time  advantage  with 
the  FFT.  In  fact,  the  goal  of  real-time  processing  has 
been  realized.  Considering  a one  second  data  acquisition 
time,  the  FFT  is  capable  of  performing  the  transform  in 
sufficient  time.  However,  as  was  discussed  earlier,  the 
results  of  the  transform  are  not  in  a readily  usable  form. 
The  operations  that  must  be  performed,  including  display, 
require  approximately  .3  seconds  of  processor  time.  Adding 
this  time  and  the  .58  seconds  required  for  the  actual  FFT, 
and  the  sum  easily  nears  the  one  second  that  would  allow 
for  real-time  processing.  Real-time  as  discussed  here 
actually  means  being  able  to  analyze  and  display  data  within 
the  time  it  takes  to  acquire  new  data.  A sampling  rate  of 
4096  points  per  second  will  take  exactly  one  second  to  fill 
one  data  field.  Thus,  the  performance  of  the  FFT  could 
easily  be  completed  in  time  for  real-time  for  FFT's  of  size 
512  and  below  (given  the  one  second  acquisition  time),  wide 
enough  range  for  most  underwater  sounds.  Large  FFT's  could 
not  be  completed  in  sufficient  time  to  apply  to  real-time 
cons iderations . 

With  respect  to  the  DFT , the  one  major  drawback  of  the 
FFT  algorithm  is  its  inability  to  be  selective  insofar  as 
what  frequency  band  is  to  be  analyzed.  The  transform  must 
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calculate  the  energy  proportions  from  zero  Hz  up  to  the 
Nyquist  frequency.  There  does  exist  however,  a variation 
on  the  FFT  which  allows  for  a "zooming  in"  on  a specified 
frequency  band.  This  variation  is  beyond  the  scope  of  this 
report . 

Associated  with  the  actual  FFT  program  is  a set  of 
software  which  serves  to  organize  the  output  of  the  FFT. 

In  addition,  it  also  acquires  data,  allows  the  user  to 
select  a desired  portion  of  that  data  for  analysis,  and 
transfers  it  to  FFT’s  data  vector.  The  data  acquisition 
routine  allows  for  a user  variable  sampling  rate  and  for 
display  of  the  acquired  waveform  on  an  oscilloscope.  Also 
incorporated  into  the  acquisition  routine  is  a movable  data 
"window".  As  set  up,  the  window  allows  for  the  viewing  of 
the  data  512  (or  any  other  number,  depending  on  the  desired 
transform  size)  points  at  a time.  The  speed  at  which  the 
window  slides  through  the  data  field  is  controlled  by  the 
setting  on  the  computer's  switch  register.  Once  the  desired 
512  points  are  chosen,  the  window  is  stopped  and  that  data 
is  loaded  for  operation  by  the  FFT. 

A second  portion  of  the  aforementioned  software  package 
performs  the  squaring,  summing,  and  square-rooting  opera- 


tions on  the  results  of  the  transform.  Once  completed,  the 
bit-reversing  routine  takes  over  and  unshuffles  the  fre- 
quency components  and  displays  the  final  result  on  the 
oscilloscope.  Examples  of  the  transform  results  and  both 
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the  shuffled  and  unshuffled  spectral  lines  appear  in 
Appendix  B. 

The  TFT  performed  quite  well  with  the  exception  of  a 
small  amount  ol  processing  noise.  There  are  numerous 
causes  ol  noise  ol  this  type,  most  of  which  are  unavoidable. 
Volumes  have  been  written  concerning  noise,  therefore  it 

5 

shall  not  he  discussed  here.  Several  examples  of  trans- 
formed waveforms  appear  in  Appendix  C,  along  with  descrip- 
tions of  each. 
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CHAPTER  THREE: 
FFT  SOFTWARE 


A complete  analysis  of  the  actual  FFT  program  is  essen- 
tial for  the  understanding  of  the  difficulties  involved  and 
for  a demonstration  of  how  necessary  it  is  for  careful  pro- 
gramming techniques  (which  result  in  as  fast  an  FFT  as 
possible).  As  was  mentioned  previously,  the  FFT  is  broken 
down  into  two  areas,  arithmetic  operations  and  index  control. 
The  purpose  of  this  chapter  is  not  to  teach  the  minicomputer 
programming  language,  therefore  the  reader  is  directed  to 
Introduction  to  Programming , fourth  edition,  published  by 
Digital  Equipment  Corporation.  The  FFT  software  is  repro- 
duced in  Appendix  D. 

Before  proceeding  to  discuss  the  operation  of  the  index 
controller,  the  variables  used  will  be  defined. 

M - This  variable  is  initialized  at  2^,  which 
is  also  the  size  of  the  transform.  M is 
halved  with  each  pass  through  the  program. 

With  each  pass,  the  DFTs  of  size  M are 
all  forced  into  DFTs  of  size  M/2.  M serves 
as  1)  a "counter"  which  guarantees  that 
the  transform  goes  to  completion  (DFTs  of 
size  one  are  formed)  , 2)  an  "incrementor" 
for  the  FFT's  running  indices,  and  3)  a 
definitive  variable  which  indicates  the 
size  of  the  DFTs  being  formed. 

IMAX  - Initialized  at  one,  IMAX  is  doubled  with 

every  pass  through  the  program.  It  serves 
1)  to  initialize  ITAL  (ITAL  = - IMAX)  2)  as 
a pointer  within  the  sine  and  cosine  tables 
to  determine  0 in  eJm®. 

ITAL  - Essentially  ITAL  counts  the  number  of  DFT 
pairs  that  must  be  operated  upon  with 
every  pass  of  the  program. 
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JTAL  - 


kl-'IND  - 


RBINI)  - 


IMF  INI)  - 


I MB  INI)  - 


SW INI)  - 


CWIND  - 


JTAL  is  a counter  which  guarantees  that 
each  DFT  block  in  Figure  4 gets  filled. 

An  example  of  how  these  four  variables 
change  for  an  eight  point  FFT  is  shown 
in  Figure  8. 

This  pointer  for  the  first  operand  in  the 
real  vector  is  the  same  as  "A”  mentioned 
in  the  preceeding  chapter. 

This  variable  points  to  the  second  operand 
in  the  real  vector  and  corresponds  to  "C". 

This  variable  corresponds  to  "B"  and  points 
to  the  first  operand  in  the  imaginary  vector 

This  variable  corresponds  to  "D"  in  the 
imaginary  vector-  IMB1ND,  IMFIND,  RBIND, 
and  RFIND  are  known  as  running  variables 
and  are  initialized  by  M for  each  DFT 
run  through. 

This  variable  points  to  the  appropriate 
word  in  the  Sine  table. 

This  variable  points  to  the  appropriate 
word  in  the  Cosine  table. 


The  index  controller  has  two  functions.  The  first  is 
to  ensure  that  the  correct  data  words  are  applied  to  the 
basic  FFT  calculation  discussed  in  the  previous  chapter. 

Ns  there  is  a definite  pattern  to  the  manner  in  which  the 
indices  progress  through  the  vector,  the  indexing  problem 
is  eased  somewhat.  The  second  function  of  the  controller 
is  to  halt  the  FFT  once  all  single  point  DFTs  have  been 
formed . 
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ITAL--  "4 
I TA  (.  - -I 
I MAX  I 


xfal  * -/ 
I Mi  -4 
IM4v  * 4 


*£ 
JTAL--Z 
ITAL--  Z 
LM/U-  2 


Figure  8:  Values  of  counters  at  the  start  of 

each  pass  through  the  program.  Those 
which  start  at  values  less  than  zero  are  run  from 
that  value  to  zero. 

Examination  of  the  flowchart  in  Figure  9 will  clearly 
explain  exactly  how  the  index  controller  operates.  The 
initialization  shown  at  the  beginning  occurs  only  once  when 
the  program  is  started.  Note  that  M is  halved  at  the  start 
of  the  first  pass.  This  is  because  M indicates  the  size 
of  the  DFT  be ing  formed.  Once  M reaches  zero  (as  a result 
of  successive  rotate  right  operations)  the  program  is 
halted.  With  every  pass  through  the  program  (LPASS) , ITAL 
is  initialized  at  -IMAX.  Also,  the  running  indices  are 
initialized  to  the  start  of  the  DFTs  last  formed.  Again, 


SET  M = 2l 
SET  ’iMAX  = 1 
SET  ITAE  = 0 


M-.-M/2 

( l, PASS)  M = 0 

YES  ^ ^ NO 

DONE  ITAL  IMAX 

I I 

HALT  IMFIND— - LIM 

RFIND  * LRE 
IMBIND  LIM+M 
RBIND  - — LRE+M 

I 

•JTAL  — - - M 


SWIND  - — LSWIND 
('WIND  LCWIND 

I 

DUMP  TO  ARITHMETIC  SUBROUTINE 


SWIND- — SWIND+ IMAX 
CWIND- — CWIND+IMAX 


INCREMENT  IMF IND  (I  LOOP) 
INCREMENT  IMBIND 

(LPASS)  INCREMENT  RFIND 

INCREMENT  RBIND 


INCREMENT  JTAL 


JTAL  = 0? 

/ \ 

YES  NO 

INCREMENT  ITAL 


ITAI. 

/ 

YES 

IMAX—-  IMAX  X 2 


= 0? 

\ 

NO 

RFIND—-  RFN IND+M 
IMFIND— IMFIND+M 
RBIND  — — RBIND+M 
IMBIND  -<-IMBIND+M 


( JLOOP) 


F igure  9 : 


Flow  Diagram  for  Index  Controller 


with  every  pass  through  the  program,  M is  halved  and  IMAX 
is  doubled. 
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At  the  beginning  of  the  JLOOP,  the  trigonometric  table 
pointers  are  set  to  the  top  of  the  tables.  Then,  with 
every  pass  through  the  arithmetic  subroutine,  these  same 
pointers  are  incremented  by  the  value  of  IMAX.  As  the 
arithmetic  subroutine  only  performs  one  basic  FFT  calcu- 
lation each  time,  the  running  indices  are  incremented  by 
one  so  as  to  point  to  the  next  set  of  operands.  ITAL  is 
then  incremented  by  one  and  checked  to  determine  whether  or 
not  each  new  DFT  are  filled.  If  they  are  not,  another  pass 
through  the  arithmetic  subroutine  is  called  for.  Otherwise, 
the  next  pair  of  DFTs  are  started  on  as  the  JLOOP  is 
entered  again.  Once  each  complete  vector  transformation  is 
finished,  another  LPASS  is  begun  and  the  process  repeats. 

A timing  analysis  yields  the  following  results. 

1)  The  ILOOP,  excluding  the  arithmetic  subroutine, 
requires  22.8  ysec  of  processor  time. 

2)  The  JLOOP  (without  the  ILOOP)  requires  53.0psec. 

3)  The  initial  setup  and  LPASS  take  54.0psec  to 
complete . 

The  LPASS  routine  is  run  through  L-l  times.  The  JLOOP  is 
run  through  first  one  time,  then  twice,  then  four  times, 
and  so  on  up  to  2L‘*  times.  The  ILOOP  is  run  through  2L_1 
times  for  each  LPASS.  All  totaled,  the  time  required  for 
the  index  controlling  operations  is  insignificant  compared 
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to  that  required  for  the  arithmetic  subroutine. 

That  portion  of  the  FFT  software  which  performs  the 
actual  calculations  is  called  the  arithmetic  subroutine.  A 
description  of  these  calculations  has  already  been  pre- 
sented. A discussion  of  how  they  are  done  does  not  require 
a I lowchart,  however,  it  may  be  helpful  to  refer  to  the 
proper  portion  of  the  software  listed  in  Appendix  D.  The 
subroutine  starts  at  line  number  1030  which  is 
0000  0400  1030  FFTMATH  0. 

Lines  1040  through  1110  are  concerned  with  setting  up  a 
scries  of  "flags"  used  to  check  for  the  sign  of  a given 
number.  As  the  multiply  operation  only  works  with  unsigned 
numbers,  the  absolute  value  of  all  operands  must  be  formed 
before  multiplication.  After  multiplication,  the  product 
must  be  given  the  proper  sign  before  any  further  operation. 
The  functions  of  the  remainder  of  the  lines  appear  in  the 
following  table. 


Line  1 

Slumbers 

Funct i on 

1120 

- 1150 

Fetches  A,  scales  it  down  by 
two  and  stores  A/2  in  a 
temporary  location. 

1160 

- 1180 

Fetches  and  scales  down  (by 
two)  C. 

1 1 90 

- 1200 

Adds  A/2  and  C/2  and  stores 
the  sum  as  A'. 

1210 

- 1240 

Fetches  B,  scales  it  down  by 
two  and  stores  B/2  in  a 
temporary  location. 

1250 

- 1270 

Fetches  and  scales  down  (by 
two)  D. 
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Line  Numbers 
1280  - 1290 

1300  - 1330 

1340 

1350  - 1410 
1420  - 1440 

1450 

1 4bl)  - 1500 

1530  - 157.0 
1580  - 1610 

1620  - 1650 
1660  - 1680 

1690  - 1770 


Function 

Adds  B/2  and  D/2  and  stores 
the  sum  as  B'. 

Fetches  C,  changes  its  sign, 
and  scales  it  down  by  two. 

Adds  -C/2  and  A/2. 

Forms  |A/2  - C/21,  increments 
the  appropriate  flags  if 
A/2  - C/2  is  negative,  and 
stores  | A/ 2 - C/2 | in  a 
temporary  location. 

Checks  sign  of  cosine  by  com- 
paring the  addressed  location 
within  the  cosine  table 
against  the  table's  middle 
location.  Recall  that  the 
cosine  table  is  a half-wave 
table. 

Causes  a program  jump  to 
line  1475.  Occurs  only  if 
the  cosine  is  positive. 

Increments  the  appropriate 
flags,  fetches  the  cosine 
and  forms  the  absolute  value 
of  the  cosine.  Occurs  only 
if  the  cosine  is  negative. 

Multiplies  I A/ 2 - C / 2 I and 
| COS  | . 

Assigns  the  proper  sign  to 
the  product  and  stores  it  in 
a temporary  location. 

Fetches  D,  scales  it  down  by 
two  and  changes  its  sign. 

Adds  -D/2  and  B/2,  changes 
the  sign  of  the  sum  and  stores 
it  in  a temporary  location. 

Checks  the  sign  of  B/2  - D/2, 
increments  the  appropriate 
flags  if  it  is  negative,  forms 
|B/2  - D/2|  and  stores  it  in 
a temporary  location. 


,Wlj  LM,  III. 
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Line 

Numbers 

Function 

1780 

- 1810 

Multiplies  |B/2  - D/2 1 
SIN. 

and 

1820 

- 1830 

Assigns  the  proper  sign 
the  product. 

to 

1840 

- 18  50 

Adds  the  last  two  products 
formed  and  stores  the  sum 
as  C\ 

I8  60 

- 18  90 

Multiplies  |B/2  - D/2| 
| COS | . 

and 

1900 

- 1920 

Assigns  the  proper  sign 
the  product  and  stores 
a temporary  location. 

to 

it  in 

1930 

- I96  0 

Multiplies  |A/2  - C / 2 | 
SIN. 

and 

I 

I 


I 


I 

I 

I 


l 

I 


1970  - 1980 

Assigns  the  proper  sign  to 

T 

the  product. 

1990  - 2000 

Adds  the  last  two  products 
formed  and  stores  the  sum 

as  D ' . 

mm 

2010 

Returns  processor  control  to 

•J 

the  index  controller. 

Jl 

I’hc  remainder  of  the  FFT  software  is  a display  routine 
which  outputs  the  real  vector  and  the  imaginary  vector  on 
t w o different  D/A  channels  for  display  on  an  oscilloscope, 
fine  180  in  the  index  controller  portion  of  the  program 
causes  a jump  to  the  display  routine  after  every  complete 
vector  transformation  by  the  FFT.  Such  an  instruction 
allows  an  interesting  look  at  every  step  in  the  formation 
of  a complete  FFT.  Photographs  of  each  step  of  an  FFT 
performed  on  a simple  one  Hz  square  wave  appear  in  Appendix 
B. 
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Nearly  all  of  the  time  required  to  generate  a complete 
FFT  is  used  up  by  the  arithmetic  subroutine  as  it  requires 
approximately  217  ^seconds  per  pass  (worst  case).  In  a 512 
point  transform,  the  arithmetic  subroutine  must  be  entered 
9 x 256  times.  Therefore,  of  the  .58  seconds  required  for 
a complete  512  point  FFT,  .50  seconds  are  spent  in  the  sub- 
routine. In  order  to  determine  how  the  speed  of  the  FFT  can 
be  improved,  it  is  necessary  to  develop  a timing  diagram. 
Such  a diagram  for  the  arithmetic  subroutine  appears  in 
Figure  10.  The  diagram  shows  that  little,  if  anything,  can 
be  done  to  improve  the  speed.  There  are  no  parts  that  take 
any  significantly  long  periods  of  time  to  run.  Numerous 
variations  of  the  program  have  been  tried  with  no  increase 
in  speed  whatsoever.  Note  that  in  summing  all  the  time 
delays,  the  worst  case  was  always  considered.  In  other 
words,  where  there  were  two  paths  to  the  same  point,  the 
slowest  path  was  always  considered.  An  examination  of 
other  possible  methods  of  increasing  the  speed  of  the  FFT 
will  be  examined  in  the  next  chapter. 

As  the  indexing  system  and  arithmetic  operations  are 
the  same  for  all  FFTs  whose  size  is  a power  of  two,  the 
basic  system  presented  herein  can  be  expanded  (within  the 
limits  of  memory)  or  contracted.  The  procedures  necessary 
for  such  alterations  are  relatively  straightforward.  How- 
ever, as  a system  expansion  requires  data  field  changes, 
and  as  data  field  changes  are  inherently  tricky,  expansion 
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10.8 

l. 

Set  up  indices. 

23.0 

| 

2. 

Form  and  store  A/2  + C/2. 

1 

23.0 
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3. 

Form  and  store  B/2  + D/2. 

1 

12.0 

/ \ 

4. 

Form  A/2  - C/2. 

\ 

3.8 

/ 

/ 

5. 

Set  up  A/2  - C/2  for  multipl 
cation. 

0.4 
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Check  sign  of  cosine. 
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\ 

6.2 
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Set  up  cosine  for  multi- 
plicat  ion . 
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9. 

Apply  proper  sign  to  product 
and  store. 
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10. 

Form  and  store  -B/2  + D/2. 

3.8 

/ \ 

11. 

Fetch  -B/2  + D/2. 
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\ 

3.8 
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12. 

Set  up  D/2  - B/2  for  multi- 
pl ication . 

17.8 

6^.4 

13. 

Multiply  | D/2  - B/2 | x | SIN | 
and  sign. 

1 

14. 

Form  and  store  C'. 

1 

19.2 

15. 

Multiply  |D/2  - B / 2 ( x | cos | 
and  sign. 

26.6 

16. 

Multiply  | SIN | x | A/2  - C/2 | 
sign,  and  form  D'. 

Total 


217  psec 


Figure  10 : FFT  Timing  diagram  showing  the 

steps  and  the  time  required  for  each. 
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is  not  recommended.  For  system  contraction,  several 
actions  must  be  undertaken. 

1)  M must  be  initialized  to  the  proper  2^,  the  size 
of  the  FFT . 

2)  MCNEG  must  be  redefined  to  be  -3200g  - 2L~^. 

3)  MSIZE  in  the  display  routine  of  the  FFT  must  be 
redefined  as  -2L. 

4)  The  trigonometric  tables  must  be  made  half-wave 
based  on  2L  points  full  wave. 

5)  The  movable  data  window  must  be  resized. 

6)  The  data  transfer  program  must  be  resized. 

7)  The  bit-reversal  routine  must  be  resized  and 
adjusted  for  the  reversal  of  the  proper  number 
of  bits  (L- 1) . 

8)  The  square,  sum,  and  square  root  program  must 
be  resized. 

The  resizing  is  necessary  so  that  the  desired  operations 
arc  performed  on  2^  words.  A discussion  of  how  to  properly 
perform  the  above  changes  is  in  the  FFT  Users  Manual  (in 
preparation) . 
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CHAPTER  FOUR: 

THE  "BLACK  BOX"  APPROACH 

Since  the  arithmetic  subroutine  of  the  FIT  requires 
such  ;i  relatively  great  amount  of  time  to  run  (..SO  seconds 
in  comparison  to  the  .58  seconds  required  for  the  entire  FFT) , 
the  feasibility  of  utilizing  an  external  device  to  perform 
the  calculations  was  investigated.  What  is  required  is  an 
extra  set  of  hardware  which  will  perform  the  basic  FFT  cal- 
culation described  earlier.  In  order  to  be  effective,  the 
external  device  must  be  capable  of  high-speed  operation  so 
that  the  added  time  advantage  will  allow  for  real-time 
analysis.  The  device  must  also  have  a direct  access  to  the 
minicomputer's  memory  for  both  the  reading  and  writing  of 
the  data. 

In  order  to  access  the  required  data  quickly  the  device 
will  have  to  do  so  directly  from  the  PDP-8's  memory.  How- 
ever, as  the  addresses  of  the  desired  data  within  memory 
change  with  every  pass  through  the  arithmetic  subroutine, 
the  device  will  have  to  access  data  in  a pseudo  - direct 
manner.  That  is,  a set  of  addresses  will  need  to  be  hard- 
wired into  the  device.  The  contents  of  memory  at  these 
locations  will  be  the  addresses  of  the  six  desired  data 
words.  The  device  will  then  store  these  addresses  in  its 
own  memory  for  future  reference.  The  data  will  then  be 
acquired  and  stored  by  the  device,  again  in  its  own  memory. 

Once  the  data  has  been  acquired  by  the  device,  program 
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control  will  be  returned  to  the  processor  for  maximum 
effective  use  of  time.  While  the  device  is  performing  the 
calculations,  the  processor  will  update  the  indices  from 
the  next  calculation.  When  finished  with  all  calculation, 
the  device  will  signal  the  processor  that  the  device  is 
ready  to  accept  control  of  the  program.  At  this  time,  the 
device  will  load  into  memory  the  four  results  of  the  basic 
calculation  into  their  proper  locations  (which  were  stored 
by  the  uv.'ce  originally).  Thus  the  device  need  only  operate 
as  quickly  as  the  computer  can  update  the  indices. 

By  operating  in  this  manner,  those  calculations  which 
were  performed  by  the  arithmetic  subroutine  in  one  half 
second  will  require  approximately 

16  x 1.2  vjsec  x 9 x 256  - .044  seconds. 

The  1.2  yseconds  is  the  length  of  one  machine  cycle  within 
the  minicomputer.  Sixteen  of  these  cycles  will  be  required 
by  the  device.  The  term  9 x 256  represents  the  total 
number  of  times  the  device  would  be  used  by  a 512  point  FFT. 

Thus  it  can  be  seen  that  what  was  once  the  slowest  part  of 
the  FFT  analysis  has  become  nearly  twice  as  fast  as  that 
which  wr  once  considered  to  be  so  fast  as  to  have  an 
almost  negligible  effect  on  the  overall  timing,  namely  the 
index  controller. 

Figure  ll  is  a block  diagram  of  the  external  hardware 

A 


device,  indicating  the  flow  of  information  between  each 
part  of  the  device.  The  figure  also  demonstrates  the 
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relationship  between  the  device  and  the  computer.  The 
operations  which  occur  within  the  hardware  are  best  outlined 
as  follows  for  a basic  understanding  of  the  function  of  the 
device . 

The  first  step  is  initiated  by  means  of  an  IOT 
instruction  within  the  FFT  index  controller.  The  purpose 
of  the  instruction  is  to  tell  the  device  that  it  is  to  set 
up  a so-called  DMA  (Direct  Memory  Access)  state  within  the 
minicomputer.  A DMA  state  allows  a device  external  to  the 
computer  to  directly  access  the  memory  of  the  computer.  The 
time  required  for  the  transfer  of  one  data  word  is  1,2  P 
seconds,  corresponding  to  one  fast  machine  cycle. 

Once  the  device  acquires  control  of  the  memory,  it 
fetches  the  necessary  six  data  words  from  memory  as  follows: 

1)  The  address  register  loads  the  first  of  the  hard- 
wired memory  addresses  onto  the  minicomputer's  MA 
(Memory  Address)  lines. 

2)  The  contents  of  memory  at  that  address  are  fed  back 
into  the  address  register  for  storage  and  readdre- 
ssing of  the  memory. 

3)  The  contents  of  memory  at  the  latter  address  are 
then  stored  in  the  input  register  as  the  first  data 
word . 

4)  The  above  three  steps  are  repeated  until  all  six 
data  words  (A,  B,  C,  D,  X,  and  Y)  are  stored  in 
the  input  register.  Note  that,  as  the  final  output 
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from  the  device  will  he  four  words  (A*,  R',  C',  and 
" ' only  the  addresses  of  A,  B,  and  P are  kept  by 
the  add  less  register. 

With  1 1 ol  the  data  read  into  t lie  device,  control  of 
t lie  memory  is  relinquished  and  returned  to  the  processor. 

Is  the  processor  updates  the  indices  for  the  next  pass 
through  the  external  hardware,  this  same  hardware  performs 
the  basic  IFT  calculation  described  in  a previous  chapter. 

first  the  term  (A+C) , appropriately  scaled,  is  formed 
and  stored  as  A'  in  the  output  register.  Similarly,  (B  + D) 
is  lormed  and  stored  as  B in  the  output  register.  Then  the 
terms  (A  C)  and  (B  P)  are  formed  and  loaded  into  the  input 
register  lor  use  by  the  multiplier.  By  use  of  the  multi- 
plier, the  terms  (A-C)X,  (A-C)Y,  (B-P)X,  and  (B-D)Y  are 
formed  and  again  stored  in  the  input  register.  Finally,  the 
\rithmetic  Logic  Unit  (ALU)  combines  these  last  four  terms 
into  (A-(’)X  - (B-P)Y  and  (A-C)Y  + (B-D)X  and  loads  each 
into  the  output  register  as  C ' and  P ' , respectively. 

As  soon  as  all  data  is  loaded  into  the  output  register, 
the  device  sets  up  a break  request,  telling  the  processor 
that  control  of  the  memory  is  again  desired.  Upon  completion 
of  the  current  instruction,  the  processor  will  relinquish 
control  of  the  memory.  Then  the  device  loads  the  mini- 
computer memory  from  the  output  register  as  follows: 

1)  The  address  loads  the  first  stored  address  (that  of 
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A)  onto  the  computer's  MA  lines. 
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2)  The  contents  of  the  first  location  in  the  output 
register  are  loaded  into  the  addressed  location 
in  memory. 

3)  The  above  steps  are  then  repeated  until  all  four 
data  words  are  loaded  into  memory. 

At  this  time,  memory  control  is  returned  to  the  pro- 
cessor and  the  program  continues  from  the  point  where  the 
data  break  occurred. 

The  schematic  diagrams  of  the  address  register,  output 
register,  input  register  and  ALU  can  be  found  in  Appendix  E. 
Also  included  are  diagrams  of  buffers  for  one  set  of  input 
lines  and  output  lines  on  both  the  ALU  and  multiplier, 
diagrams  of  the  internal  register  addressing  counters,  of 
the  device  selection  logic,  and  of  the  logic  which  activates 
the  DMA  state  in  the  processor. 

The  multiplier  is  an  array  of  individual  2x4  bit 
multiplier  chips  arranged  to  produce  a 24  bit  product  from 
two  twelve  bit  operands.  As  it  operates  with  two’s  comple- 
ment numbers  there  is  no  need  to  check  for  the  sign  of  an 
operand  before  multiplication.  Thus,  the  design  of  the 
hardware  is  simplified  considerably. 

The  Arithmetic  Logic  Unit  is  a simple  three  chip 
device  which  must  be  capable  of  both  addition  and  sub- 
traction, the  choice  being  made  with  one  control  line.  It 
too  is  a twelve  bit  two's  complement  device.  As  both  the 
ALU  and  multiplier  devices  arc  combinatorial  logic  devices, 
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no  clocking  is  required  for  either.  However,  as  there  is 
no  clock,  there  must  exist  a method  of  holding  data  on  the 
outputs  so  a change  on  one  of  the  inputs  will  not  affect  the 
desired  answer.  In  order  to  read  the  outputs  into  the  input 
register  (as  described  earlier),  the  data  at  these  outputs 
will  necessarily  he  reapplied  to  the  inputs.  The  conse- 
quences of  not  providing  for  an  output  latch  could  therefore 
be  disastrous.  Similarly,  as  both  inputs  to  the  ALU  (as  well 
as  the  multiplier)  originate  with  the  input  register,  a 
latch  must  be  provided  on  one  input  line  which  will  hold  the 
proper  data  word  while  the  other  data  word  is  being  read 
from  t he  input  register. 

The  aforementioned  latches  must  each  be  12  bits  wide. 

In  addition,  the  clock  should  be  edge  triggerable,  in  order 
to  avoid  timing  ambiguities  which  might  otherwise  arise  as 
a result  of  too  long  (or  too  short)  a clock  pulse  used  with 
level  triggering. 

I.ikewisc,  the  input,  output,  and  addressing  registers 
must  all  be  twelve  bits  wide.  As  they  are  actually  memories, 
however,  and  not  simple  latches,  control  signals  are  re- 
quired for  reading  from,  writing  into,  and  addressing  each 
of  the  registers.  In  order  to  read  or  write,  the  proper 
control  line  must  be  made  low.  On  the  other  hand,  the 
addressing  is  best  performed  by  counters.  Both  the  output 
and  addressing  registers  arc  addressed  sequentially  from 
0-3.  I he  address  sequence  for  the  input  register  is  not 


— - — 


I 

I 

I 


t 

I 


J 

!J 

0 

Q 

] 

'] 


39 


I 


1 

i 


r < 


l 


M 


quite  so  simple  (Figure  12).  The  most  convenient  sequence 
yet  contrived  results  in  the  input  register  holding  the 
following  words,  in  order  from  left  to  right. 


Word 

0 

1 

2 

3 

4 

5 


(A-C)X 


A 
B 
X 
Y 

C (A-C) 

D (B-D) 


(A-C)Y 


(B-D)X 


(B-D) Y 


where  A,  B,  X,  Y,  C,  and  D are  written  in  order  from  top  to 
bottom. 


000 

001 

010 

Oil 

100 

101 

000 

100* 

001 

101* 

100 

010 

000 

011 

001 

101 

010* 

Oil* 

000 

Oil 

001 

010 


* indicates  that  these  addresses 
occur  two  times  in  succession. 


Figure  12:  Input  register  address  sequence. 


As  can  be  seen  from  Figure  12,  the  use  of  a simple 
counter  utilizing  J-K  flip-flops  is  not  feasible.  However 


a 5 bit  counter  can  be  used  to  address  a ROM  sequentially 
such  that  the  ROM  output  at  each  address  corresponds  to  the 
proper  3 bit  code  of  Figure  12.  The  ROM  address  would  be 
set  to  00000  at  the  start  of  every  pass  through  the  device. 
The  counter  controlling  the  ROM  address  would  be  clocked 
every  time  a new  location  within  the  input  register  is  to  be 
accessed.  The  ROM  (actually  a PROM)  would  be  by  far  the 
most  economical  approach,  in  terms  of  both  money  and  hard- 
ware, as  fewer  components  would  be  required  with  a ROM  than 
without  one. 

It  also  appears  that  rather  than  suffer  the  design  of 
a complicated  hardware  logic  system  to  control  the  indi- 
vidual components  within  the  FFT  hardware,  a ROM  should  be 
incorporated.  Again,  the  ROM  would  be  addressed  sequentially 
by  a counter  which  would  be  clocked  every  time  another 
"instruction"  was  desired.  In  this  instance,  however,  in 
order  to  ensure  that  all  the  outputs  of  the  ROM  change  at 
the  same  time  with  a change  in  address,  the  outputs  should 
ip  clocked  into  a latch.  This  clock  would  best  be  con- 
trolled by  a time  delay  device  (one-shot  multivibrator)  so 
that  enough  time  is  allowed  for  the  outputs  to  change  state 
as  necessary. 

Hue  to  the  limited  time  left  for  research,  the  control 
of  external  components  will  have  to  be  investigated  at  a 
later  date.  It  appears  at  this  point  that  the  idea  of  an 
external  FIT  hardware  device  is  quite  feasible.  Once 
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constructed,  the  PDP-8/E's  only  functions  will  be: 

1)  indexing 

2)  data  storage 

3)  data  acquisition 

4)  display. 

In  the  future,  the  last  four  functions  of  the  mini- 
computer listed  above  could  conceivably  be  taken  over  by  a 
second  set  of  hardware,  most  likely  controlled  by  a super 
high-speed  microprocessor.  In  this  manner,  an  entire  real- 
time FFT  system  could  be  constructed.  This  system  would  be 
entirely  self-contained  and  relatively  inexpensive  (in 
comparison  with  the  PDP-8/E).  Hopefully  such  a design  will 
be  undertaken  by  this  author  within  the  next  several  years. 
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APPENDIX  A 


Following  are  a series  of  photographs  of  the  spectral 
density  of  a group  of  input  wave  forms  developed  by  the 
Discrete  Fourier  Transform.  Each  photograph  has  a width  of 
512  spectral  lines  representing  512  Hz. 


Figure  A- 1 : DFT  of  a 

180  Hertz  10  volt  pea 
with  1/32  of  a second 
volt  peak-to-p 


1/32  of  a second  of  a 
-to-peak  sine  wave 
of  a 300  Hertz  10 
ak  sine  wave. 


J • J UUHPUl 


Figure  A- 2:  Amplitude  expanded  view  of  the  waveform 

described  in  Figure  A-l.  Note  the  effect 
on  the  180  Hertz  component. 


Figure  A-2:  DFT  of  a 100  Hertz  sinusoidal  input 

waveform  buried  approximately  10  dB  down  in  "white"  noise. 
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APPENDIX  B 

The  Fast  Fourier  Transform,  as  described  previously, 
divides  a set  of  time  sampled  data  points  into  a series  of 
DFTs.  The  following  photographs  demonstrate  the  manner  in 
which  these  vector  transformations  occur.  Each  photograph 
has  a width  of  S12  points  and  is  scaled  at  five  volts/cm. 
The  top  trace  in  each  photograph  corresponds  to  the  real 
part  of  the  complex  frequency  components,  the  bottom  trace 
corresponds  to  the  imaginary  part  (unless  otherwise  noted) 


Figure  B-l:  A one-cycle  square-wave 

upon  which  the  FFT  was  performed. 


| . 


Figure  B-2:  The  contents  of  the  real  and  imaginary 

vectors  after  the  first  FFT  vector  transformation. 
Note  the  correlation  with  the  COS  (top)  and  with 
the  SIN  (bottom) . 


Figure  B-3:  The  results  of  the  second  FFT  vector  trans- 

formation. Note  the  division  of  the  two  distinct  forms  in 
B-2  into  four  distinct  forms.  Also  note  the  decrease  in 
amplitude.  The  small  points  approximately  1 cm  below  the 
zero  levels  are  images  from  the  last  point  in  the  respective 

vectors . 


Figure  B-4:  The  third  FFT  vector  transformation 

Again  note  the  decreasing  amplitude. 


Figure  B-5:  Note  how  the  top  trace  still 

shows  correlation  with  the  COS,  and  the  bottom 
trace  with  the  SIN. 


Figure  B-8:  After  the  seventh  FFT  vector  trans- 

formation, the  only  distinguishable  point  is  what  corresponds 
to  the  first  spectral  line  in  the  bottom  trace  5 cm  from 

the  left  edge. 


Figure  B-9:  The  eighth  vector  transformation  shows 

that  the  real  terms  arc  virtually  equal  to  zero. 


Figure  B - 1 0 : The  final  step  shows  zero  correlation 

with  the  COS  and  no  even  harmonics  (the  left  half  of  the 

photograph) . 


Figure  B- 1 1 : This  photograph  shows,  on  the  bottom, 

the  magnitudes  of  the  complex  frequency  components.  The 
top  trace  shows  the  bit-reversed  magnitudes  of  the  components 

the  bottom  trace. 
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Figure  B - 1 2 : An  amplitude  expanded  view  of  the 

top  trace  in  Figure  B-ll.  Note  the  1:1/3,  1:1/5,  etc. 
magnitudes  of  the  first,  third,  fifth,  etc.,  harmonics. 
The  even  harmonics  are  zero.  Also  note  the  reflection  of 
the  first  256  spectral  lines  which  appears  on  the  right 
half  of  the  photograph.  If  the  photograph  were  folded  on 
the  5 cm  line,  the  two  ends  would  match  up  perfectly  due 
to  this  "mirror  imaging." 


APPENDIX  C 


The  following  series  of  photographs  demonstrate  the 
applicability  of  the  FFT  developed.  Shown  are  the  FFT's  of 
several  input  waveforms.  Each  photograph  has  a width  of 
256  spectral  lines,  each  line  has  a bandwidth,  or  resolu- 
tion, of  eight  Hertz. 


Figure  C-l:  A 100  Hertz,  10  volt  peak-to-peak 

sinusoidal  input  results  in  this  spectral  distribution  with 

the  FFT. 
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Figure  04:  The  same  submarine  as  in  Figure  0 3, 

recorded  on  audio  quality  tape  recorder  at  7 1/2  ips 
Note  the  increase  in  noise  level. 


Figure  03:  Fish  noise  combined  with  the  submarine 

in  Figure  0 3 front  the  III’  tape  recorder.  The  output  level 
from  the  tape  had  to  be  reduced  in  order  to  avoid  amplitude 
distortion.  Thus  the  submarine  cannot  be  seen. 
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APPENDIX  D 

Included  in  this  appendix  are  the  programs  written 
for  the  PDP-8/E  minicomputer  in  the  development  of  the  FFT . 
The  first  program  is  the  FFT  itself.  The  second  is  the 
data  acquisition  set  of  software.  Finally,  the  programs 
which  find  the  magnitude  of  the  complex  results  and  reorders 
these  results  is  presented. 

The  FFT  starts  at  location  200. 

The  data  acquisition  routine  starts  at  location  6400. 
The  sliding  data  window  begins  at  location  3700;  and  the 
window  loader  starts  at  location  6600. 

The  program  which  calculates  magnitudes  begins  at 
location  6000.  The  bit-reversal  routine  starts  at  location 
3600. 
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DATAROUT 


3700 


3700 

621  1 

3701 

1343 

3702 

3344 

3703 

7604 

3704 

1345 

3705 

1343 

3706 

7421 

3707 

7701 

3710 

7440 

371  1 

5324 

3712 

134  6 

3713 

6551 

3714 

7000 

3715 

7000 

3716 

7000 

3717 

7000 

3720 

7000 

3721 

7000 

3722 

7200 

3723 

6551 

3724 

3345 

3725 

1346 

3726 

6553 

3727 

7000 

3730 

7000 

3731 

7200 

3732 

6553 

3733 

1745 

3734 

6552 

3735 

7200 

3736 

2345 

3 « «.  ' 

7000 

3740 

2344 

3741 

5333 

3742 

5301 

3743 

7000 

3744 

0000 

3745 

0000 

3746 

2000 

6400 

6400 

621  1 

6401 

7200 

6402 

324  7 

6403 

7240 

6404 

7100 

6405 

6130 

6406 

7200 

6407 

1245 

6410 

6132 

100 

*3700 

110 

6211 

120 

TOPI 

TAD 

K 7000 

130 

DC  A 

TALLY 

140 

LAS 

150 

TAD 

POINT 

160 

TAD 

K 7000 

170 

MQL 

180 

ACL 

1 90 

SZA 

200 

JMP 

OUT 

210 

TAD 

K 2000 

220 

6551 

230 

NOP 

NOP 

NOP 

NOP 

NOP 

NOP 

CLA 

235 

6551 

240 

OUT 

DC  A 

POINT 

250 

TAD 
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APPENDIX  E 

This  appendix  presents  that  logic  designed  for  the  FFT 
hardware  device  discussed  in  Chapter  4.  Note  that  no  con- 
trolling logic  has  been  included,  only  the  registers  and  the 
like  have.  The  multiplier  matrix  is  also  not  included.  For 
a description  of  this  matrix  the  reader  is  directed  to 
Fairchild  Application  Note  329. 


ALU  A LINE  BUFFER  (+  MULT  x LINE  BUFF) 


COMPLEMENT  ALU  (ACTIVE-LOW) 


OUTPUT  REGISTER 


FROM  ALU 


aaiaiMttmiiahmM-,.-- 


UNCLASSIFIED 

Security  Classification 

— - - - DOCUMENT  CONTROL  DATA  - R & D 

1 Originating  ACTIVITY  (Corporate  author)  ^ 2a.  REPORT  SECURITY  CLASSIFICATION 

UNCLASSIFIED 


U.  S.  Naval  Academy , Annapolis. 


26.  GROUP 


I 3 REPORT  TITLE 


Frequency  analysis  using  a minicomputer. 

A OESCRiptive  NOTES  (Type  of  report  end  inclusive  dates) 

Research  report. 

5 A u THORUI  (First  name,  middle  initial,  last  name) 


D.  C.  Boch. 

0 REPOR T DATE 

17  May  1976. 

Sa  CONTRACT  OR  GRANT  NT 


6.  P ROJ  EC  T NO 


7*.  TOTAL  NO  OF  PAGES  7b.  NO  OF  REFS 

81  3 

9a.  ORIGINATOR’S  REPORT  NUMBER(S) 

U.  S.  Naval  Academy,  Annapolis  - Trident 
Scholar  project  report,  no.  75  (1976) 

9b.  OTHER  REPORT  NO(S>  (Any  other  numbers  that  may  be  assigned 
this  report) 


MO  DISTRIBUTION  STATEMENT 

1 

This  document  has  been  approved  for  public  release  j 
its  distribution  is  UNLIMITED. 

11  SUPPLEMENT ARV  NOTES 

12  SPONSORING  MILITARY  ACTIVITY 

13  ABSTRAC  T 

U. S.  Naval  Academy,  Annapolis. 

This  study  was  initiated  with  the  aim  of  determining  the  fundamental 
capabilities  and  limitations  of  one  minicomputer  in  the  development  of  a spectral 
Dlot  through  Discrete  Fourier  Transfom (DFT).  Included  within  this  general  topic 
» 're  two  fundamental  goals  : 

1/  Development  of  the  fastest  FFT  (Fast  Fourier  Transform)  possible 
utilizing  only  the  basic  capabilities  of  the  PDP-8/E  general  purpose  minicom- 
puter. Successful  completion  of  this  objective  brought  the  machnine's  fundamental 
limitations  for  real-time  spectral  analysis  into  clear  perspective. 

2/  Investigation  of  the  feasibility  of  an  external  hardware  "add-on" 
which  would  considerable  increase  the  spectral  analysis  capabilities  of  the  com- 
puter. Proving  worthwhile,  the  design  of  such  hardware  was  launched  but  not 
completed  - due  to  time  restrictions. 

As  a result  of  this  study,  it  became  apparent  that  all  three  systems 
(DFT,  FFT,  FFT  hardware)  are  capable  of  operation  in  real  time.  However,  the 
upper  frequency  limit  for  the  DFT  would  be  approximately  50  Hz,  for  the  FFT 
500  Ha,  and  for  the  FFT  hardware  2500  Ha. 


DD  ,'”".“..1473  IP4GE 


S/N  0101. 807. 6801 


UNCLASSIFIED. 

Security  Classification 


